Atari Mega Archive 1

Internet Message Format | 1994-08-27 | 8.2 KB

Date: 23 Mar 1993 17:41:02 -0500 (EST)
From: ianl@bix.com
Subject: RE: wanted: ARGV standard extension
In-Reply-To: <9303231044.AA02211@irz405.inf.tu-dresden.de>
To: hohmuth@freia.sax.de
Message-Id: <9303231741.memo.68475@BIX.com>
Tue, 23 Mar 1993 17:41:02 -0500 (EST)
X-Cosy-To: hohmuth@freia.inf.tu-dresden.de

I vaguely remember the prior discussions on passing empty args. That was right before my temporary drop-out from the usenet scene. I also remember the method you outlined as being one of the most robust. My only real objection to it was the complexity of the code needed to implement it. Call me lazy if you will, but after 22 years of programming, I put a lot of stock in the idea of long design leading to the simplest possible code.

Not that the method is a nightmare of coding by any means, but it does mean the sender of the args has to make a couple of passes over the data before it can begin writing the args to the environment area (it has to find the empty args first, since the way of expressing them in list form requires a variable amount of up-front space). On the receiving end, a process of tokenizing and ASCII->binary conversion is needed. I picture the need for something like an is_in_null_list(argnum) function that scans the ASCII ARGV= string, tokenizing and converting ASCII->binary as it goes, and this will have to be called for any arg that starts with a space. The empty list can't easily be converted to binary once, up front, without setting arbitrary limits on its size or using dynamic memory allocation. (I.e., it looks like a lot of the runtime library might get sucked into every program just so that ARGV args can be processed.)

Runtime performance is a secondary consideration. Now that I've grown used to having ARGV support around, I've also grown used to abusing it in makefiles, especially by doing things like passing 400 object module names to AR on a single command line, and so on. I don't like the idea of making two passes over 400 args if it can be avoided.

As I remember it, the last thing I proposed on usenet was a simple escaping mechanism which was neither embraced nor definitively shot down by someone presenting a situation in which it failed catastrophically. Let me see if I can recall it and present it again in an organized fashion...

First, let's consider non-ARGV schemes. xArgs already deals with empty args; nothing we do here affects it.

Technically, the basepage image of the command line also allows empty args. The rule is that the string is terminated by count, not contents. According to the standard, a \0 in the basepage can signal an empty arg without any problems. In reality, many implementations use the count byte to place a \0 at the end of the string, then use strcpy(), strtok(), and similar tools to process the image. An embedded \0 would break these things. However, I think the way they'd break is pretty safe: the program will most likely see fewer args than it expects, and will thus whine and die. It isn't likely that the program will break in catastrophic or data-damaging ways. It will also be pretty simple to change existing routines that parse the basepage image to be driven by the count rather than by strtok() et al.

That leaves ARGV. My escaping mechanism can be summed up in one sentence: if the first character of any arg is less than or equal to \1, that arg is prefixed with an extra \1.

On the arg-sender's side, this is implemented as the data is being written to the environment data area. The sender examines the first char of each arg string as it is being copied to the env area. If that char is <= \1, it outputs a \1, followed by the complete arg string.

On the receiving end, this is implemented as the argv[] array is being created. The first char of each string is examined, and if it is \1, the pointer placed in argv[] is incremented by one, so that it points to the second char of the arg.

An empty arg is represented in the env data as \1\0.
The \1 is skipped by the receiver, meaning that the pointer in argv[] will point to the \0. An arg of \1 is represented in the env data as \1\1\0; the first \1 is skipped, and the pointer in argv[] will point to \1\0. An arg of \1\2\3 is represented as \1\1\2\3\0; the pointer in argv[] will point to the second \1. If the arg is non-empty and its first char is not \1, neither the sender nor the receiver takes any special action; it works just as it does now.

This strikes me as a general solution that doesn't require multiple passes over the data on the sending side, or tricky parsing on the receiving side.

That leaves the issue of how unaware programs will behave, and I'll admit that's the part of this scheme I've given the least thought to. I'll brainstorm on the fly here, and rely on the fact that y'all may spot problems that don't occur to me.

First let's consider an aware sender and an unaware receiver. For an empty arg, the aware sender passes \1\0, and the receiver sees exactly that. It will probably react badly to the \1, but I think probably not any worse than it would react to a space. Neither is a valid filename, and either should result in an error message. I don't know what other use a program might make of empty args. A program such as tr would translate all occurrences of \1 instead of the \0 chars you might have had in mind. But right now such a program can't translate \0 chars anyway; there's no way to even ask it to.

A similar problem arises with trying to pass an arg of \1. The aware sender passes \1\1\0, and the receiver might be a bit confused by getting two chars where it expected one. But I don't see this as leading to catastrophic data loss either.

In truth, one of the reasons I like the idea of \1 as an escape is that it strikes me as a char that doesn't often show up in args now, and one that is likely to lead to a controlled failure of a program that receives it unexpectedly (because it isn't anything like a valid filename or option).
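
The sender and receiver rules described earlier are simple enough to sketch in a few lines of C. This is a minimal sketch, not HSC's actual runtime code: the names argv_encode() and argv_decode() are hypothetical, and it assumes the args are written into a buffer as NUL-terminated strings, with one extra \0 (an empty entry) marking the end of the list. Both sides make a single pass over the data.

```c
#include <stddef.h>
#include <string.h>

/* Sender side (hypothetical helper): copy argc strings into buf as
 * NUL-terminated entries. Any arg whose first char is <= \1 (that is,
 * an empty arg or one starting with \1) gets an extra \1 in front.
 * A final \0 (an empty entry) ends the list.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t argv_encode(char *buf, size_t bufsize, int argc, char *const argv[])
{
    size_t pos = 0;

    for (int i = 0; i < argc; i++) {
        size_t len = strlen(argv[i]);
        int esc = (unsigned char)argv[i][0] <= 1;

        /* room for: optional \1, the arg, its \0, and the final list \0 */
        if (pos + esc + len + 2 > bufsize)
            return 0;
        if (esc)
            buf[pos++] = '\1';
        memcpy(buf + pos, argv[i], len + 1);   /* includes the arg's \0 */
        pos += len + 1;
    }
    if (pos + 1 > bufsize)
        return 0;
    buf[pos++] = '\0';                          /* end-of-list marker */
    return pos;
}

/* Receiver side (hypothetical helper): walk the entries until the empty
 * one that ends the list, storing up to max pointers in argv_out. If an
 * entry starts with \1, the stored pointer skips that one char. */
int argv_decode(char *area, char *argv_out[], int max)
{
    int argc = 0;
    char *p = area;

    while (*p != '\0' && argc < max) {
        argv_out[argc++] = (*p == '\1') ? p + 1 : p;
        p += strlen(p) + 1;                     /* step past this entry's \0 */
    }
    return argc;
}
```

Because an escaped empty arg occupies two bytes (\1\0), the receiver never mistakes it for the terminating empty entry, so empty args no longer cut the list short.
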
A \1 might be a valid char to a program that searches for or translates strings of characters in a file, but it shouldn't show up often in such contexts, and should at worst lead to the program not finding the strings in the file because of the extra \1.

Now let's consider an unaware sender and an aware receiver. I don't know what an unaware sender is likely to do with an empty arg. If the sender just puts the \0 into the env area, you end up with \0\0, prematurely terminating the args, which is just what happens right now anyway; no change there. The real problem here is that if an arg starts with \1, the aware receiver is going to skip that char, possibly screwing up the receiver's behavior, because it's skipping something that isn't actually a prefix char.

I'm tempted to say "so what, it isn't a situation that comes up often enough to worry about, as per the discussion above on how rare leading \1 chars in args are." But if there's a feeling that we should care about this just for completeness' sake, then what we need is some extra validation that an aware receiver can use to determine whether the sender is aware. In that case, we can resort to a simple marker passed as the value following the ARGV= part of the env. We need only be careful to choose a marker that can't occur in MWC's current use of the ARGV= value. I forget the details of MWC's use of that string, but I'll bet something as simple as ARGV=ARGV2\0 would do the trick. Then the receiver need only verify the presence of the ARGV2 string, and use that as its key for whether to skip leading \1 chars in the args.

Well, that's my idea and my thoughts on it. I'll be happy to implement (in HSC's runtime library) any reasonable scheme that everyone agrees on. Frankly, with my current worries over the proper definition of the GEM programming interface, I won't have a lot of energy to spare for lobbying hard for this ARGV scheme.
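
The ARGV2 check proposed above might look something like the following sketch. Everything here is hypothetical: find_argv_args() and build_argv() are illustrative names, testing for an exact value of "ARGV2" is one assumed reading of "verify the presence of the ARGV2 string", and the code assumes the usual layout in which ARGV= is the last environment entry and the args follow its terminating \0, ending with an empty entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: scan an env block (NAME=value\0 ... \0) for the
 * ARGV= entry. Sets *escaped to 1 only if its value is exactly "ARGV2"
 * (assumed marker test). Returns a pointer to the first arg string,
 * which follows the ARGV= entry's \0, or NULL if there is no ARGV=. */
char *find_argv_args(char *env, int *escaped)
{
    char *p = env;

    while (*p != '\0') {
        if (strncmp(p, "ARGV=", 5) == 0) {
            *escaped = strcmp(p + 5, "ARGV2") == 0;
            return p + strlen(p) + 1;
        }
        p += strlen(p) + 1;                     /* next NAME=value entry */
    }
    *escaped = 0;
    return NULL;
}

/* Hypothetical helper: build argv[] from the arg area, stopping at the
 * empty entry or after max args. A leading \1 is stripped only when the
 * sender advertised itself as aware via the marker. */
int build_argv(char *args, int escaped, char *argv_out[], int max)
{
    int argc = 0;

    while (*args != '\0' && argc < max) {
        argv_out[argc++] = (escaped && *args == '\1') ? args + 1 : args;
        args += strlen(args) + 1;
    }
    return argc;
}
```

When the marker is absent (for example, an unaware sender writing a plain ARGV=), a leading \1 is passed through untouched, so the unaware-sender case costs the receiver only one extra string compare.
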
For GEM and ARGV alike, it's accommodating the widest range of people that interests me most, but since I do more GEM programming than CLI-related stuff, that's where most of my energy will be going.

Feel free to redistribute this reply to your mailing list for comments, or to post it publicly if you feel that's the best forum for feedback on it.

- Ian
ianl@bix.com
ilepore@nyx.cs.du.edu (which just gets forwarded to me on bix anyway)